Enable paged attention in varlen forward #831

Merged
tridao merged 3 commits into Dao-AILab:main on Mar 15, 2024

Conversation

sgrigory (Contributor) commented Feb 16, 2024

Paged attention was added recently in 54e80a3, but only exposed through fwd_kvcache. This PR exposes it through varlen_fwd as well.
@bottler @danthe3rd
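For reference, a minimal sketch of how the paged path could be driven through the Python varlen interface after this change. The block_table keyword comes from this PR, but the exact tensor shapes and page-size constraints below are my assumptions and should be checked against the updated docstring:

import torch
from flash_attn import flash_attn_varlen_func

# Toy sizes for illustration only (assumed layout, not authoritative).
num_heads, head_dim = 8, 64
page_size = 256                      # elements per KV-cache page (assumed)
seqlens_q, seqlens_k = [5, 7], [300, 500]
device, dtype = "cuda", torch.float16

# Queries stay packed varlen-style: (total_q, num_heads, head_dim).
q = torch.randn(sum(seqlens_q), num_heads, head_dim, device=device, dtype=dtype)

# With paging, K/V are page pools: (num_pages, page_size, num_heads_k, head_dim).
k_cache = torch.randn(6, page_size, num_heads, head_dim, device=device, dtype=dtype)
v_cache = torch.randn_like(k_cache)

# block_table maps each sequence to its pages: (batch, max_pages_per_seq).
block_table = torch.tensor([[0, 1], [2, 3]], device=device, dtype=torch.int32)

# Cumulative lengths, as in the existing varlen interface (int32 on GPU).
cu_seqlens_q = torch.tensor([0, 5, 12], device=device, dtype=torch.int32)
cu_seqlens_k = torch.tensor([0, 300, 800], device=device, dtype=torch.int32)

out = flash_attn_varlen_func(
    q, k_cache, v_cache,
    cu_seqlens_q, cu_seqlens_k,
    max_seqlen_q=max(seqlens_q), max_seqlen_k=max(seqlens_k),
    causal=True,
    block_table=block_table,  # new in this PR
)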

Tests:

pytest  tests/test_flash_attn.py -k test_flash_attn_varlen_causal 2>&1 | tee log.txt
collected 420068 items / 417188 deselected / 2880 selected

tests/test_flash_attn.py ............................................... [  1%]
........................................................................ [  4%]
...
........................................................................ [ 99%]
.........................                                                [100%]

=================== 2880 passed, 417188 deselected in 50.32s ===================

@sgrigory changed the title from "[DRAFT] Enable paged attention in varlen forward" to "Enable paged attention in varlen forward" on Feb 16, 2024
@sgrigory marked this pull request as ready for review on February 16, 2024 at 18:32
@rkooo567

Is this PR planned to be merged? It'd be very nice to have this feature!

@tridao merged commit 2a15840 into Dao-AILab:main on Mar 15, 2024
tridao (Contributor) commented Mar 15, 2024

Thank you @sgrigory!

@rkooo567

So when using this feature, am I supposed to pass k_cache and v_cache as the k and v arguments? Is that correct? (Maybe the docstring should be updated.)
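A comment-only sketch of my reading of the test code in this PR, for anyone with the same question (an assumption until the docstring is updated):

# Regular varlen path: K/V are packed along the token dimension, and the
# per-sequence slices are located via cu_seqlens_k.
#     k, v: (total_k, num_heads_k, head_dim)
#
# Paged path (this PR): the cache pools themselves are passed as k and v,
# and block_table records which pages each sequence owns.
#     k_cache, v_cache: (num_pages, page_size, num_heads_k, head_dim)
#     block_table: (batch, max_pages_per_seq), dtype torch.int32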

@rkooo567

Also, I assume cu_seqlens_k is just ignored if a block table is used?
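My reading of the kernel change, offered as an assumption rather than an answer: block_table replaces the key offsets, but cu_seqlens_k still seems to supply the per-sequence key lengths, i.e. roughly:

# Assumed semantics: per-sequence key lengths are still recovered from
# cu_seqlens_k even when block_table provides the page layout.
seqlens_k = cu_seqlens_k[1:] - cu_seqlens_k[:-1]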
